General Audio Transcription Training

5. What Audio Formats are Used to Collect Audio/Video

 

Here we will teach more about the devices used in the Voicescriber method to gather the recordings. The two methods used today are Tape and Digital.

We will not go over tape because tape recordings are pretty much a thing of the past. Instead, we will spend the time going over digital audio files.

Recording Formats
Most popular recorders use a single track of audio. Some of them have two speeds at which the audio can be recorded. Recording on the fastest speed produces higher quality dictation, but provides less recording time on the digital tape or digital file.

Multiple-track recorders are typically used in settings that require very accurate transcriptions and have multiple persons that might speak simultaneously. For instance, courtroom transcripts are often taken by a four-track recorder with each person wearing a separate microphone and recording on a different track of the tape: judge, two lawyers, witness.

Multiple-track recorders are rare outside of the courtroom setting. However, they provide superior transcripts because the transcription equipment lets the transcriber listen to each track individually or to all tracks at once.

At this time you should not concern yourself too much with cassette recordings because, as already mentioned, these are a thing of the past.

Digital Audio Files
As mentioned earlier, the latest way of transcribing is from digital files, such as MP3 or MP4. In the next few years almost all transcription work will be through computers, using specific file formats on the computer. This is why now is the best time to take this opportunity to learn this new way of transcribing.

Many of the large transcription companies who have been using the old methods such as Dictaphone and tape transcription are struggling to get up-to-date with the newest technology. This is why we will focus this training on the latest and greatest transcribing – DIGITAL!

With the advent of multimedia computers (audio, video, etc.), more material is being generated in the form of digital computer files. Digital handheld dictation devices are now available that record to a memory card and generate audio files you can place on disk or send over the Internet. You will be able to transcribe such files, which come in a variety of formats.

We are only going to teach you the terms and some brief definitions. DON’T get overwhelmed with this information or feel you need to become an expert on computer audio formats. We are only providing you with terminology so if someone refers to certain terms, you will know they are referring to an audio format. You will not be expected to supply technical information on these terms.

JUST A LITTLE EDUCATION NEVER HURT ANYONE!

Some of the existing formats for digital audio files are:

File Extension | Creation Company | Description
.3gp | - | Multimedia container format; can contain proprietary formats such as AMR, AMR-WB or AMR-WB+, but also some open formats.
.act | - | ACT is a lossy ADPCM 8 kbit/s compressed audio format recorded by most Chinese MP3 and MP4 players with a recording function, and by voice recorders.
.aiff | Apple | Standard audio file format used by Apple. It could be considered the Apple equivalent of wav.
.aac | - | The Advanced Audio Coding format is based on the MPEG-2 and MPEG-4 standards. aac files are usually ADTS or ADIF containers.
.amr | - | AMR-NB audio, used primarily for speech.
.au | Sun Microsystems | The standard audio file format used by Sun, Unix and Java. The audio in au files can be PCM or compressed with the μ-law, a-law or G729 codecs.
.awb | - | AMR-WB audio, used primarily for speech; same as the ITU-T's G.722.2 specification.
.dct | NCH Software | A variable codec format designed for dictation. It has dictation header information and can be encrypted (as may be required by medical confidentiality laws). A proprietary format of NCH Software.
.dss | Olympus | dss files are an Olympus proprietary format. It is a fairly old and poor codec; gsm or mp3 are generally preferred where the recorder allows. It allows additional data to be held in the file header.
.dvf | Sony | A Sony proprietary format for compressed voice files; commonly used by Sony dictation recorders.
.flac | - | File format for the Free Lossless Audio Codec, a lossless compression codec.
.gsm | - | Designed for telephony use in Europe, gsm is a very practical format for telephone-quality voice. It makes a good compromise between file size and quality. Note that wav files can also be encoded with the gsm codec.
.iklax | iKlax | An iKlax Media proprietary format. The iKlax format is a multi-track digital audio format allowing various actions on musical data, for instance mixing and volume arrangements.
.ivs | 3D Solar UK Ltd | A proprietary version with Digital Rights Management developed by 3D Solar UK Ltd for use in music downloaded from their Tronme Music Store and interactive music and video player.
.m4a | - | An audio-only MPEG-4 file, used by Apple for unprotected music downloaded from their iTunes Music Store. Audio within the m4a file is typically encoded with AAC, although lossless ALAC may also be used.
.m4p | Apple | A version of AAC with proprietary Digital Rights Management developed by Apple for use in music downloaded from their iTunes Music Store.
.mmf | Samsung | A Samsung audio format that is used in ringtones.
.mp3 | - | MPEG Layer III Audio. It is the most common sound file format used today.
.mpc | - | Musepack or MPC (formerly known as MPEGplus, MPEG+ or MP+) is an open source lossy audio codec, specifically optimized for transparent compression of stereo audio at bitrates of 160–180 kbit/s.
.msv | Sony | A Sony proprietary format for Memory Stick compressed voice files.
.ogg, .oga | Xiph.Org Foundation | A free, open source container format supporting a variety of formats, the most popular of which is the audio format Vorbis. Vorbis offers compression similar to MP3 but is less popular.
.opus | Internet Engineering Task Force | A lossy audio compression format developed by the Internet Engineering Task Force (IETF) and made especially suitable for interactive real-time applications over the Internet. As an open format standardised through RFC 6716, a reference implementation is provided under the 3-clause BSD license.
.ra, .rm | RealNetworks | RealAudio format designed for streaming audio over the Internet. The .ra format allows files to be stored in a self-contained fashion on a computer, with all of the audio data contained inside the file itself.
.raw | - | A raw file can contain audio in any format but is usually used with PCM audio data. It is rarely used except for technical tests.
.sln | - | Signed Linear PCM format used by Asterisk. Prior to v.10 the standard formats were 16-bit Signed Linear PCM sampled at 8 kHz and at 16 kHz. With v.10 many more sampling rates were added.
.tta | - | The True Audio, real-time lossless audio codec.
.vox | - | The vox format most commonly uses the Dialogic ADPCM (Adaptive Differential Pulse Code Modulation) codec. Similar to other ADPCM formats, it compresses to 4 bits. Vox files are similar to wave files except that they contain no information about the file itself, so the codec sample rate and number of channels must first be specified in order to play a vox file.
.wav | - | Standard audio file container format used mainly on Windows PCs. Commonly used for storing uncompressed (PCM), CD-quality sound files, which means that they can be large in size, around 10 MB per minute. Wave files can also contain data encoded with a variety of (lossy) codecs to reduce the file size (for example the GSM or MP3 formats). Wav files use a RIFF structure.
.wma | Microsoft | Windows Media Audio format, created by Microsoft. Designed with Digital Rights Management (DRM) abilities for copy protection.
.wv | - | Format for WavPack files. http://www.wavpack.com/flash/wavpack.htm
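
If you ever want a quick way to check whether a file you have been sent is one of the audio formats listed above, a short script can look at the file extension for you. The sketch below is only an illustration: the function name and the short labels are our own, and the dictionary covers just a handful of the extensions from the list.

```python
from pathlib import Path

# A small subset of the extensions from the list above, with short labels.
# The labels are our own shorthand, not official names.
AUDIO_FORMATS = {
    ".aiff": "Apple standard audio file",
    ".dss": "Olympus dictation format",
    ".flac": "Free Lossless Audio Codec",
    ".m4a": "audio-only MPEG-4 file",
    ".mp3": "MPEG Layer III Audio",
    ".ogg": "Ogg container (usually Vorbis)",
    ".wav": "wave container (usually uncompressed PCM)",
    ".wma": "Windows Media Audio",
}

def describe_audio_file(filename):
    """Return a short label for a known audio extension, or None."""
    ext = Path(filename).suffix.lower()  # ".MP3" and ".mp3" are treated alike
    return AUDIO_FORMATS.get(ext)

print(describe_audio_file("Interview_01.MP3"))
print(describe_audio_file("notes.txt"))
```

Remember that the extension is only a hint: as noted in the list, a .wav file, for example, can hold audio encoded with several different codecs.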

 

The latest MP4 technology – This is now the most common transcription format.
Since the latest technological advances have resulted in using MP4 and digital recordings when doing interviews, we will explain why researchers (clients needing transcription work) are switching to digital MP4 technology, and how to use this technology for transcribing. This is a huge plus when it comes to applying for online assignments from these companies.

You will be able to show clients up-to-date knowledge of MP4 and digital transcribing. Digital recordings are by far the clearest and most easily transcribed recordings for any of the scores of transcription projects you will work on.

What is MP4 and what does it have to do with research and transcription?

A very helpful and clear description of these recording methods is given by Nicholas Sheon. Analog recorders store sound as magnetic patterns in the coating of a thin plastic tape, usually housed inside a cassette. Digital recordings store sound as a series of numbers; each number represents a sample, or cross section, of the continuous sound vibration. To fool the ear into hearing a continuous sound wave, digital recordings capture thousands of samples per second. This results in rather large computer files.
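
To make the idea of "sound as a series of numbers" concrete, here is a minimal sketch that turns a pure tone into samples. The function name and the chosen values (a 440 Hz tone sampled 8,000 times per second) are assumptions picked for illustration; real recorders do the same thing with a microphone signal instead of a formula.

```python
import math

def sample_tone(freq_hz, sample_rate, duration_s):
    """Return a pure tone as a list of numbers, one number per sample."""
    n = round(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

# One hundredth of a second of a 440 Hz tone at 8,000 samples per second.
samples = sample_tone(freq_hz=440, sample_rate=8000, duration_s=0.01)
print(len(samples))                        # how many numbers were stored
print([round(s, 3) for s in samples[:4]])  # the first few sample values
```

Even this tiny slice of sound needs 80 numbers, which is why full-length recordings become large files.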

For example, the music on CDs that we buy at the record store is stored as uncompressed audio, equivalent to large .WAV files of about 10 megabytes per minute of music. Files this large quickly become unwieldy when stored on a computer. Formats such as MP3 and MP4 are standard ways to compress these large digital audio files.
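
The "about 10 megabytes per minute" figure can be checked with a little arithmetic. The sketch below assumes the usual CD audio settings (44,100 samples per second, 16 bits per sample, two channels for stereo); the function name is our own.

```python
def pcm_megabytes_per_minute(sample_rate=44100, bits_per_sample=16, channels=2):
    """Size of one minute of uncompressed (PCM) audio, in megabytes."""
    bytes_per_second = sample_rate * (bits_per_sample // 8) * channels
    return bytes_per_second * 60 / 1_000_000

print(pcm_megabytes_per_minute())  # 10.584
```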

The growth of the Internet and increasingly powerful home computers have made MP3 a very popular format for storing, organizing, and distributing music files over computer networks. MP3 files sound very similar in quality to the standard .WAV format that comes on a music CD, even though the compression software removes data from the original audio that your ears cannot really hear, such as very high or low frequencies. The audio inside MP4 files is compressed in a similar way.

By removing these frequencies and compressing what is left, the compression software can reduce audio files to about one tenth of their original size before you can really notice a difference in audio quality. Copying music from a CD and compressing it this way is often called “ripping” a CD. You can make the files even smaller by lowering the bit rate (the amount of data stored per second of audio), at some cost in sound quality.

Why use digital audio for data analysis?

This is why researchers who frequently use transcription services are switching to MP4 formats instead of the previous analog cassette. The MP4 format not only sounds a lot better than traditional audio-cassettes, but it is easier to store and takes less time to duplicate and transfer to a computer.

When a researcher records interviews or counseling sessions using MP4, it results in a much clearer and better sound which means less time and fatigue involved in transcription and analysis. Gone is the sense that you are listening to your informant’s voice through a wall of static and rumbling bass that is typical with analog tape recorders. MP4 files can be stored on an MP4 player, a CD-R, or a computer hard drive.

For example, if a standard music CD can hold 80 minutes of music in .WAV format, a researcher can fit ten times more, i.e. 800 minutes or about 13 hours of MP4 audio, onto one CD-R. Digital files can be transferred onto CD-R at very high speeds without any loss of sound quality. Having multiple backup copies ensures researchers won’t lose their data and that they have it with them when they need it.
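
The rough "ten times more" estimate above can be sketched as a calculation. The numbers here are assumptions for illustration only: a CD-R holding 700 MB of data and audio compressed at 128 kbit/s, a common setting. The function name is our own.

```python
def minutes_that_fit(capacity_mb, bitrate_kbps):
    """Minutes of compressed audio that fit in capacity_mb megabytes."""
    bytes_per_minute = bitrate_kbps * 1000 / 8 * 60
    return capacity_mb * 1_000_000 / bytes_per_minute

minutes = minutes_that_fit(capacity_mb=700, bitrate_kbps=128)
print(round(minutes), "minutes, or about", round(minutes / 60, 1), "hours")
```

Under these assumptions the result is a little over 12 hours, which is in line with the rough figure given above. A lower bit rate would fit even more audio on the disc.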

Each time an analog tape is dubbed, e.g. to send to a transcriptionist, some of the audio quality is lost. This is not a problem with digital files because each copy is identical to the original. Digital files can also facilitate data security. CD-Rs and computer folders containing MP4 files can be password protected or even encrypted so that only the researcher can access the audio data.
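
One way to confirm that a digital copy really is identical to the original is to compare checksums of the two files. The sketch below uses Python's standard hashlib module; the file names are hypothetical and the "audio" is just stand-in bytes written to a temporary folder.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "interview.mp4"         # hypothetical file name
    original.write_bytes(bytes(range(256)) * 100)  # stand-in for audio data
    copy = Path(tmp) / "interview_copy.mp4"
    shutil.copyfile(original, copy)
    copies_match = sha256_of(original) == sha256_of(copy)

print(copies_match)  # True
```

If even a single byte differed between the two files, the checksums would not match.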

The researcher can also “bleep” out identifying information from an interview (e.g. “My boyfriend’s name is bleeeeep. He lives on bleeeep Street.”)

A medical researcher, for example, can edit out personal health information so that the project complies with HIPAA regulations. Researchers can also change the pitch of a voice while retaining the speed of the recording, effectively altering the sound of a voice without slowing it down. This is useful for presentations where you want to play a segment of audio but make the voice unrecognizable.
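
Because a digital recording is just a list of numbers, "bleeping" a passage amounts to overwriting the samples in that time range with a tone. The following is a minimal sketch, assuming the audio is already loaded as a list of values between -1 and 1; the function name, tone frequency, and volume are our own choices, and audio editing programs offer the same operation through their menus.

```python
import math

def bleep(samples, sample_rate, start_s, end_s, tone_hz=1000, volume=0.5):
    """Return a copy of samples with the span start_s..end_s replaced by a tone."""
    out = list(samples)
    first = max(0, int(start_s * sample_rate))
    last = min(len(out), int(end_s * sample_rate))
    for i in range(first, last):
        out[i] = volume * math.sin(2 * math.pi * tone_hz * i / sample_rate)
    return out

# One second of stand-in "speech" at 8,000 samples per second.
speech = [0.25] * 8000
censored = bleep(speech, sample_rate=8000, start_s=0.25, end_s=0.5)
```

The original list is left untouched, so the researcher keeps an unedited master copy.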

Besides the better sound quality and ease of duplication and storage already mentioned, digital audio has some distinct advantages over analog tapes (micro cassettes). Because MP4 files are recorded and played back at a fixed sample rate, they always play back at the same speed at which they were recorded.

Changes in pitch are a common problem with tapes, in that they may record at one speed on one tape machine and then when played on another machine, they will play either faster or slower. The resulting change in playback speed alters the pitch and the feel of the voice.
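
The link between playback speed and pitch can be put into numbers. A standard rule is that doubling the speed raises the pitch by one octave, or 12 semitones; the sketch below applies that rule to smaller speed errors. The function name is our own.

```python
import math

def pitch_shift_semitones(speed_ratio):
    """Pitch change, in semitones, when a tape plays at speed_ratio times normal."""
    return 12 * math.log2(speed_ratio)

# A tape machine running 5% slow lowers every voice by almost a semitone.
print(round(pitch_shift_semitones(0.95), 2))
```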

Even the slightest drop in pitch or speed can lead a transcriber to interpret what they are hearing differently. Slower playback makes speakers sound drunk, and faster playback makes them sound nervous or on speed. Although MP4 files record and play at a fixed speed, playback software can slow down or speed up a recording while preserving the pitch.

This is very useful for transcription. Another advantage of a constant playback and record rate is the ability to precisely index or bookmark a selection in an audio file.

For example, a researcher has noted in their transcript that there is an unintelligible utterance at 2 minutes and 32.457 seconds in the recording. They want to ask someone else to listen to it and see if they can make sense of it. If the researcher gives them a tape, they will have trouble finding it because the other person’s tape player may play at a different speed; what happens at 2:32.457 on one person’s machine may fall at a different point on another tape player.

This has been a problem for years with researchers sending cassette tapes to transcribers. However, this segment will always be found at a precise time on a digital recording, no matter what computer or MP4 player you are using to listen to it. The other big advantage is that a researcher could store their audio data on a server or in computer files so a transcriptionist could download it.
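
The reason a timestamp is exact on a digital recording is that every moment corresponds to a specific sample number. Here is a minimal sketch of that conversion, assuming a recording made at 44,100 samples per second; the function names are our own.

```python
def timestamp_to_sample(minutes, seconds, sample_rate=44100):
    """Return the sample number that corresponds to a minutes:seconds timestamp."""
    return round((minutes * 60 + seconds) * sample_rate)

def sample_to_timestamp(index, sample_rate=44100):
    """Return (minutes, seconds) for a given sample number."""
    total_seconds = index / sample_rate
    minutes = int(total_seconds // 60)
    return minutes, total_seconds - minutes * 60

position = timestamp_to_sample(2, 32.457)
print(position)                       # 6723354
print(sample_to_timestamp(position))  # back to roughly 2 minutes, 32.457 seconds
```

The same sample number points to the same sound on any computer or player, which is what makes the bookmark reliable.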

They can also play segments of their data in a presentation or make it available on the web so that readers can hear what the transcript is attempting to represent. The difficulty of working with cassettes for analysis has led many qualitative researchers to rely heavily on text transcriptions as a basis for their analysis.

This reflects a particular view of spoken language as merely a window into concepts, beliefs, and experiences that are seen to exist in the mind of the speaker. According to this view, there is little difference between written words and the spoken words they represent.

This is a bit like saying a dead butterfly specimen pinned down in a display case is equivalent to watching a live specimen fluttering and interacting in its environment. There is so much detail that tends to be lost when analog recording technology is used or when spoken discourse is transcribed, i.e. translated into written text.

Certain software programs used for qualitative analysis, like Transcriber (which we will be giving you for free a little later), will allow you to synchronize transcripts to a digital audio file, much like a karaoke machine. Such synchronizing and linking is not possible with cassette tapes. Finally, if you want to measure how long a particular phenomenon takes, digital audio and video allow you to precisely measure the temporal unfolding of social interaction.
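
The basic idea behind this kind of synchronizing can be sketched in a few lines: each transcript line is stored with its start time, and the program looks up which line belongs to the current playback position. This is only an illustration with made-up data and our own function name, not a description of how Transcriber or any other program works internally.

```python
from bisect import bisect_right

# (start time in seconds, text) pairs, sorted by start time. Made-up sample data.
segments = [
    (0.0, "Interviewer: Thanks for coming in today."),
    (3.2, "Participant: Happy to be here."),
    (5.9, "Interviewer: Let's start with your background."),
]
starts = [start for start, _ in segments]

def line_at(playback_s):
    """Return the transcript line being spoken at playback_s seconds."""
    i = bisect_right(starts, playback_s) - 1
    return segments[max(i, 0)][1]

print(line_at(4.0))  # Participant: Happy to be here.
```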

For example, a researcher may want to measure and tabulate the amount of time counselors spend discussing various issues with their clients. By indexing the counseling sessions in this way, they can also retrieve related segments for further comparison and analysis. They can even create time lines that chart the sequence and distribution of various communication formats over the course of the session.
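
As a sketch of the tabulating described above, suppose each indexed segment of a session is stored as a start time, an end time, and a topic. Adding up the time per topic then takes only a few lines. The topics and times below are made-up sample data, and the function name is our own.

```python
from collections import defaultdict

# (start in seconds, end in seconds, topic) index entries. Made-up sample data.
session_index = [
    (0, 95, "introductions"),
    (95, 410, "risk assessment"),
    (410, 530, "referrals"),
    (530, 760, "risk assessment"),
]

def seconds_per_topic(entries):
    """Add up how many seconds of the session were spent on each topic."""
    totals = defaultdict(float)
    for start, end, topic in entries:
        totals[topic] += end - start
    return dict(totals)

for topic, seconds in seconds_per_topic(session_index).items():
    print(topic, round(seconds / 60, 1), "minutes")
```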

What can be done with Audio files to enhance the recording?

Digital audio files can be enhanced either to improve poor-quality sound or by adding various special effects.

* Uneven speaker volumes can be adjusted so that low volume speakers can be heard.

* One speaker can be increased/decreased in volume to generate a sense of distance or depth.

* Many constant background noises can be eliminated without distorting the speech.

* A large number of special effects can be added to all or parts of the recording.

Such services are typically charged additionally at an hourly rate.
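
To give a feel for what a volume adjustment does, here is a minimal sketch of the simplest kind, often called peak normalization: every sample is multiplied by the same amount so that the loudest moment reaches a chosen level. It assumes the audio is a list of values between -1 and 1; the function name and target level are our own. To bring up one quiet speaker only, the same step would be applied just to the stretches where that person talks.

```python
def normalize_peak(samples, target_peak=0.9):
    """Scale all samples so the loudest one reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

quiet = [0.0, 0.1, -0.2, 0.05]
louder = normalize_peak(quiet)
print([round(s, 3) for s in louder])
```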

About audio files from video
Video may now refer to videotape or electronic video files. Digital audio can usually be extracted from video files and transcribed as noted above. Videotape transcription requires making an intermediate audiotape that can be more easily transcribed.
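
One common way to pull the audio out of a video file is the free command-line tool ffmpeg. The sketch below only builds the command; it assumes ffmpeg is installed on your computer, and the file names and function name are hypothetical. The `-vn` option tells ffmpeg to leave out the video, and `-acodec pcm_s16le` asks for ordinary 16-bit uncompressed audio, which suits a .wav file.

```python
def ffmpeg_extract_audio_command(video_path, audio_path):
    """Build an ffmpeg command that drops the video and writes 16-bit PCM audio."""
    return [
        "ffmpeg",
        "-i", video_path,        # the input video file
        "-vn",                   # no video in the output
        "-acodec", "pcm_s16le",  # 16-bit uncompressed (PCM) audio
        audio_path,              # the output file, e.g. a .wav file
    ]

command = ffmpeg_extract_audio_command("interview.mp4", "interview.wav")
print(" ".join(command))
```

To actually run it, you could pass the list to `subprocess.run(command, check=True)` from Python's standard library, or type the printed line at a command prompt.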

You will also be able to transcribe audio from any source on the World Wide Web if you can access it with a standard browser or program. For example, you could transcribe an online video or podcast while playing it directly from the web.
